Characterization of Neowestiellopsis persica A1387 (Hapalosiphonaceae) based on the cpcA, psbA, rpoC1, nifH and nifD gene sequences

Background
Complex descriptions of new strains of cyanobacteria appear very frequently. Their main importance lies in the potential new substances that these strains could synthesise, as well as in the different properties that result from their different ecological niches. These descriptions rely mainly on the 16S rRNA gene with the ITS region, or on whole-genome sequencing. Neowestiellopsis persica represents a unique example of the influence of ecology on morphological changes, with almost identical 16S rRNA identity. Although our previously described Neowestiellopsis persica strain A1387 was characterized by 16S rRNA analysis, we used different molecular markers to provide a way to separate strains of this genus that are closely related at the genetic level.
Materials and methods
In order to conduct an in-depth study, several molecular markers, namely psbA, rpoC1, nifD, nifH and cpcA, were sequenced and studied in Neowestiellopsis persica strain A1387.
Results
The results of the phylogenetic analysis based on cpcA showed that the studied strain A1387 falls into a separate clade from N. persica, indicating that this signature sequence could be a useful molecular marker for phylogenetic separation of similar strains isolated in the future.
Conclusions
Analysis of strain A1387 based on gene differences confirmed that it is a Neowestiellopsis strain. The morphological changes observed in the previous study could be due to different ecological and cultivation conditions compared to the type species. At the same time, the sequences obtained have increased our understanding of this species and will help in the future to better identify strains belonging to the genus Neowestiellopsis.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12862-024-02244-z.


Introduction
Identification of the true-branched cyanobacteria/cyanoprokaryota, which traditionally belong to the Hapalosiphon/Stigonematales clade [1], is usually challenging. The strains belonging to this clade have unique morphological characters, which unfortunately are not sufficient for species identification [2]. The family Hapalosiphonaceae, to which the genus Neowestiellopsis belongs, is a monophyletic clade. However, some genera within it, such as Westiellopsis, Fischerella and Hapalosiphon, are considered to be polyphyletic [2].
The existence of polyphyletic genera makes it necessary to use a sufficient number of different molecular markers to study closely related species. In some cases, the lack of resolution of traditional genetic markers, mainly the 16S rRNA gene, means that several different genes are needed to identify species belonging to these genera.
In the past, the use of different molecular markers such as rpoC1, nifD, nifH, cpcA and psbA has helped to resolve the problem with closely related species. The rpoC1 gene, which encodes the β-subunit of RNA polymerase, is a more discriminating genetic marker between closely related species [9]. This marker was recently used in the study and description of the genus Minunostoc [7] and the species Neocylindrospermum variakineticum [10] and Dulcicalothrix alborzica [11].
The psbA gene, an important functional gene, is part of the photosystem II reaction center and encodes photosynthetic D1 proteins [16]. Multiple copies of this gene can be found in cyanobacteria/cyanoprokaryota, such as Synechococcus sp. [17]. In Nostocales, this gene shows great variability and can be present at 1 to 11 copies [12]. Although this molecular marker is not often used for phylogenetic studies, it has been used in studies of species belonging to Aliinostoc [13,14] and Synechococcus [15]. The main problem with using this gene as a molecular marker, compared to the results from 16S rRNA, is the difference in primer specificity. Because of this, the results of community studies may not be comparable [16].
The molybdenum-dependent nitrogenase (nif) structural genes appear to have a single origin in cyanobacteria. The highly conserved genes nifD and nifH encode dinitrogenase reductase, a protein subunit of the nitrogenase complex involved in N2 fixation. They are thought to have been inherited from a common cyanobacterial ancestor [20]. A total of 16 nif genes have been identified in cyanobacteria, forming different operons (nifBSU, nifENXW, nifHDK and nifVZT) [20,21]. Common to all N2 fixers, they are useful for characterizing diazotrophic communities and differentiating cyanobacterial genera [4]. These molecular markers have been used in studies focusing on the genera Desmonostoc [16], Nunduva, Kyrtuthrix [17], Crocosphaera, Rippkaea, Zehria [18] and others. The nifD gene also provides a phylogenetic signal [23] and has been used to elucidate the evolutionary relationships among heterocyte-forming cyanobacteria [24]. It also proved useful in distinguishing between two genera of heterocyte-forming cyanobacteria, Nostoc and Anabaena [23], where nifH failed [25].
The phycocyanin-encoding operon has been used in the past to resolve cyanobacterial taxonomy [8]. For phylogenetic resolution, conserved coding regions such as cpcB and cpcA were used, while closely related species were separated by the highly variable intergenic spacer region (IGS) [19,20]. These molecular markers have also shown good resolution in distinguishing between freshwater biofilm-forming, planktonic and terrestrial cyanobacteria [20]. They have recently been used in taxonomic studies of the genera Arthrospira [21] and Microcystis [22], along with the species Compactonostoc shennongjiaensis [7] and Raphidiopsis curvispora [12]. Both cpcA and nifH appear to be more useful for strain discrimination than the commonly used 16S rRNA gene, which shows low intrageneric variability in many cyanobacteria [7].
The genus Neowestiellopsis, originally described by Kabirnataj et al. [23], was isolated from dried rice fields in Mazandaran, Iran. This genus forms a separate clade when 16S rRNA is used as a molecular marker, which is further supported by the unique folding of the secondary structures of the 16S-23S rRNA sequences. Based on these data, two species were described, Neowestiellopsis persica and Neowestiellopsis bilateralis. Nowruzi et al. [24] also identified strain A1387 as belonging to N. persica, with 100% homology to N. persica SA33. However, significant morphological differences could be identified between these two strains, such as a different branching type, lack of biseriate development of filaments, larger cells, and the presence of akinetes and monocytes in N. persica A1387.
The aim of the present study was to extend the original description of N. persica strain A1387 by sequencing and analyzing the cpcA, rpoC1, psbA, nifH and nifD genes, to obtain a better understanding of strains belonging to the genus Neowestiellopsis.

Cultivation of Neowestiellopsis persica A1387
Neowestiellopsis persica A1387 (Hapalosiphonaceae) was purchased from the Cyanobacteria Culture Collection (CCC) and Alborz herbarium at the Science and Research Branch, Islamic Azad University, Tehran, with the accession number A1387. Purified cultures were maintained in BG11 medium at 28 ± 2 °C with periodic shaking (twice a day). The culture room was illuminated with ca. 50-55 µmol photons m⁻² s⁻¹ under a 14:10 h light:dark photoperiod [24].

Molecular and sequence analysis
Genomic DNA was isolated from 16- to 18-day-old log-phase cultures using the Himedia Ultrasensitive Spin Purification Kit (MB505). The manufacturer's instructions were followed, with the exception of an increased incubation time for the lysis solutions AL and C1, which were set to 60 and 20 min, respectively. DNA fragments within the following genes were amplified using the oligonucleotide primers and PCR reactions listed in Table 1: nifD, nifH, psbA, rpoC1 and cpcA. PCR was performed with Bio-Rad reagents in 25 µl reactions containing 10-20 ng DNA template, 0.5 µM of each primer, 1.5 mM MgCl2, 200 µM dNTPs and 1 U/µl Taq DNA polymerase. The PCR profiles for the different genes were carried out according to Table 1. PCR products were checked by electrophoresis on 1% agarose gels (SeaPlaque® GTG®, Cambrex Corporation), using standard protocols. The products were purified directly using the Geneclean® Turbo kit (Qbiogene, MP Biomedicals) and sequenced using the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems, Life Technologies).
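As an illustration of how the final concentrations stated for the 25 µl reaction translate into pipetting volumes, the following minimal sketch applies the dilution relation C1·V1 = C2·V2. The final concentrations are taken from the text; the stock concentrations (10 µM primers, 25 mM MgCl2, 10 mM dNTPs) are assumptions chosen for illustration only and are not reported in this study.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Volume of stock (µl) giving final_conc in the reaction (C1*V1 = C2*V2).

    final_conc and stock_conc must be expressed in the same unit.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_conc * reaction_volume_ul / stock_conc


REACTION_VOLUME_UL = 25.0  # reaction volume stated in the Methods

# name: (final concentration in µM from the Methods,
#        ASSUMED stock concentration in µM, not given in the paper)
COMPONENTS_UM = {
    "forward primer": (0.5, 10.0),
    "reverse primer": (0.5, 10.0),
    "MgCl2": (1500.0, 25000.0),   # 1.5 mM final, assumed 25 mM stock
    "dNTPs": (200.0, 10000.0),    # 200 µM final, assumed 10 mM stock
}


def reaction_volumes(components=COMPONENTS_UM, volume_ul=REACTION_VOLUME_UL):
    """Return the stock volume (µl) to pipette for every component."""
    return {
        name: stock_volume_ul(final, stock, volume_ul)
        for name, (final, stock) in components.items()
    }


volumes = reaction_volumes()
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µl")
```

The remaining volume would be made up with buffer, template, polymerase and water; those amounts depend on reagent formats that the text does not specify, so they are left out of the sketch.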
The partial sequences were compared with those available in the NCBI database (June 2023) using BLASTn. The BLASTX tool (blast.ncbi.nlm.nih.gov/Blast.cgi) was used for the psbA, rpoC1, cpcA, nifH and nifD genes. The sequences were annotated for the coding regions using the NCBI ORF Finder and the ExPASy proteomics server. Nucleotide similarities were computed using the program SIAS (Sequence Identity and Similarity; available at http://imed.med.ucm.es/Tools/sias.html) [25] with the PAM250 matrix.
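To make the notion of pairwise sequence identity concrete, the sketch below computes percent identity between two sequences taken from the same alignment. It follows one common convention, counting only columns in which neither sequence has a gap; it is not a reimplementation of SIAS, whose normalisation options may differ, and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only
print(round(percent_identity("ACGTAC", "ACGTTC"), 2))
```

Under the same choice of compared columns, an uncorrected p-distance expressed in percent is simply 100 minus this identity value.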

Phylogenetic analysis
The psbA, rpoC1, cpcA, nifH and nifD gene sequences obtained in this study, as well as the best-hit sequences (> 94% identity) retrieved from GenBank, were first aligned using MAFFT version 7 [26] with automatic settings for nucleotide sequences. All alignments were visualised using Jalview [27,28] and then used to build a maximum likelihood phylogenetic tree for each gene with IQ-TREE version 2 [29,30]. The TIM2e + G4 + F, TIM2e + I + G4 + F, TVMe + I + G4 + F, TIM3 + F + G4 and TIM3 + F + I + G4 models were used for the nifD, nifH, psbA, rpoC1 and cpcA-IGS genes, respectively, as suggested (BIC criterion) by the model test implemented in IQ-TREE. Tree robustness was estimated with 1000 bootstrap replicates, with support expressed as percentages. The program MrBayes version 3.2.7a [27,28,31] was used to calculate a phylogenetic tree for each gene by Bayesian inference. The Markov chain Monte Carlo (MCMC) algorithm, using default parameters, was run for 10 000 generations with 2 runs of four incrementally heated chains, starting from random trees and sampling every 10 generations. The first 25% of the trees were discarded as burn-in and the remaining trees were used to construct a 50% majority rule consensus tree for each gene.

Table 1 (column headings: Target gene/sequence; Sequence 5′-3′; Thermal profile; Reference)
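The MCMC sampling scheme described for the Bayesian analysis (10 000 generations, sampling every 10 generations, 2 runs, 25% burn-in) fixes how many trees enter each consensus. The short sketch below works through that arithmetic. It is a simplification that ignores the starting tree of each run, which some programs also record as a sample, and the function name is hypothetical.

```python
def retained_samples(generations, sample_every, runs, burnin_fraction):
    """Count the sampled trees kept after burn-in (starting tree ignored)."""
    per_run = generations // sample_every
    discarded_per_run = int(per_run * burnin_fraction)
    kept_per_run = per_run - discarded_per_run
    return {
        "per_run": per_run,
        "discarded_per_run": discarded_per_run,
        "kept_per_run": kept_per_run,
        "kept_total": kept_per_run * runs,
    }


# Settings as stated in the text
summary = retained_samples(
    generations=10_000, sample_every=10, runs=2, burnin_fraction=0.25
)
print(summary)
```

With these settings each run yields 1000 sampled trees, of which 250 are discarded, so about 1500 trees across the two runs contribute to the 50% majority rule consensus for each gene.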

Results
Our previous study, which focused on phylogenetic analyses based on 16S rRNA sequences, suggested that this strain is genetically N. persica. However, a different morphology (Fig. 1) and the presence of genes responsible for the production of cyanotoxins indicated that this strain could be a different species [24].
When we compared the morphology of the studied strain with Neowestiellopsis persica SA33 (MF066912.1) and N. bilateralis SA16, we found differences in morphological characteristics (Table 2). Branching in Neowestiellopsis bilateralis was found on both sides of the main axis, whereas in our studied strain it occurred on only one side, as in N. persica. Our strain presented V- and T-type branching, while N. persica and N. bilateralis had only T-type branching. In N. persica SA33, biseriate development was observed, with terminal cells of branches tapered toward the apex, and the first cell of the branch adjacent to the main filament was irregular in shape, although these characteristics were never seen in our studied strain.
In both Neowestiellopsis species, the main filament cells that gave rise to branches were irregular in shape, with some being squeezed from both sides, but there were no irregularly shaped cells in the studied strain. Overall, the mean size of the vegetative cells of both N. bilateralis and N. persica was smaller than that of the studied strain. However, the sizes of heterocytes in the main and branched filaments were in the same range.
In our strain, akinetes and monocyte reproductive cells were observed, but these were not reported for the other species of Neowestiellopsis.
In our present study, we focused more on the differences between Neowestiellopsis persica A1387 and other strains belonging to the Neowestiellopsis cluster. Our analyses show that the available databases lack sequences for genes other than rpoC1. For this reason, we decided to take a closer look at four more genes besides rpoC1: psbA, nifD, nifH and cpcA. For these genes, we identified the closest possible sequences from the same cluster as the original Neowestiellopsis strain, or from strains in the closest clusters, which belong to Fischerella, Mastigocladus, Hapalosiphon and Westiellopsis.

Phylogenetic analyses
First, we created a multiple sequence alignment for each of the studied genes using MAFFT. Alignments were visualised as shown in Supplementary Figs. S1-S5 and similarities in the sequences were highlighted. Positions with the lowest level of sequence similarity are not highlighted, and positions showing the highest level of conservation are highlighted in dark blue. Consensus nucleotides for each position in the sequences are shown at the bottom of the multiple sequence alignment.
These alignments were used for the construction of the phylogenetic trees, with the number of bootstrap replicates set to 1000. Phylogenetic trees based on the different gene markers are shown in Figs. 2, 3, 4, 5 and 6; circles indicate standard bootstrap support (%).
In the case of the cpcA gene, the trees were built from 24 nucleotide sequences. As shown in Fig. 2, the sequence from the studied strain Neowestiellopsis persica A1387 is closely related to the cpcA gene from Fischerella sp. NIES 2361 (KT832353), Microcystis aeruginosa NPLJ 4 (FJ801046) and Stigonema hormoides (KT832399), with bootstrap support of 92%.
For the gene nifD, 48 nucleotide sequences were used to calculate the phylogenetic tree (Fig. 3). Interestingly, the sequence from the studied Neowestiellopsis persica A1387 is placed close to the root of the tree branch. This sequence shows close evolutionary relationships with the nifD sequences from Fischerella and Westiellopsis species (bootstrap support 100%), namely Fischerella sp. UTEX 1903 (AY196955) and W. ramosa HPS (KY020126) (Fig. 4).
From the same family as nifD, we also analysed another gene, nifH. In this case, we used 54 sequences. As shown in Fig. 4, nifH from Neowestiellopsis persica A1387 clusters closely with nifH sequences from several Fischerella strains (JF923553, KT832452 and KT832456) with high bootstrap support (100%).
Another gene we analysed was psbA. The phylogenetic tree constructed with psbA included 70 cyanobacterial sequences of this gene (Fig. 5). The sequence from N. persica A1387 forms a separate branch which clusters with the clade containing Fischerella sp. NIES-3753 (AP017305), with a bootstrap value of 40. Other closely related sequences were observed with Nostoc species (CP003552), in addition to several species of the genus Calothrix, all with high confidence based on the bootstrap values of the branches.

The last of the genes analysed was rpoC1. For the phylogenetic analysis of this gene, we aligned 32 cyanobacterial sequences and then used this alignment to calculate the phylogenetic tree. The rpoC1 sequence from Neowestiellopsis persica A1387 formed a cluster with two more sequences from the genus Neowestiellopsis, namely Neowestiellopsis persica SA33 (MF115984) and Neowestiellopsis bilateralis SA16 (MF115983). Both cases showed strong support, with bootstrap values of 100. The cluster contained two other sequences, one from the genus Fischerella (AP018298 and AB074804) and one from the genus Hapalosiphon (EU151909) (Fig. 6). Using Bayesian inference, trees for each gene were constructed (Supplementary Fig. S6). These trees support the positions of the strains without conflict, with the exception of the psbA gene, where the strain forms a separate branch with Fischerella sp. NIES-3754 (AP017305), and cpcA, where it belongs to a branch with Fischerella sp. Cohn (M75599).

Discussion
In the present work, we extended our molecular analyses of Neowestiellopsis persica strain A1387 by using the psbA, rpoC1, nifD, nifH and cpcA genes, with the aim of adding these data to the databases for Neowestiellopsis persica in order to help with a better understanding of phylogenetic relationships between species belonging to this genus. The only information available for any of these genes from the genus Neowestiellopsis is for the rpoC1 gene [23]. Regarding N. persica SA16 (MF066911) and N. bilateralis SA23 (MF066912), the closest similarities were with Hapalosiphon hibernicus B2-3-1 (EU151909) and Fischerella muscicola (AB075910), with a 96% similarity for both of these strains [23]. For strain A1387, the most similar strains were N. persica SA16 (MF066911) with 99.36% similarity and N. bilateralis S23 (MF066912) with 99.41% similarity. Other closely related sequences within the clade Neowestiellopsis were Fischerella sp. NIES 4106 (AP018298) with 94.74% similarity and Hapalosiphon sp. IAM-M-264 with 94.71% similarity. For the gene psbA, the most similar sequence was Fischerella sp. NIES-4106 (AP018298). For the gene nifD, the closest strain was Fischerella sp. UTEX 1903 (AY196955), and for nifH the closest strain was Fischerella sp. NQAIF3111 (KJ636982). Regarding similarity to the cpcA gene, the closest strain was Fischerella sp. NIES2361 (KT832393) at 87.13%.
Usually, phylogenetic trees based on the psbA and nifD genes have relatively similar characteristics [34], although problems with phylogenetic tree construction could be caused by multiple copies of some genes in genomes. For example, multiple copies of the psbA gene can be found in cyanobacteria [35], with nine psbA copies in Fischerella sp. PCC9605 encoding the G4-D1 protein. Furthermore, phylogenies based on these genes do not correspond with cyanobacterial phylogenies based on 16S rRNA [36]. However, when this gene is used to characterize closely related sequences or strains, these sequences always group together within the psbA-based tree. It seems that closely related strains tend to have similar D1 protein complements [37]. Furthermore, this gene seems to be suitable for comparing communities in similar environments, because the number of psbA D1 gene copies depends on environmental conditions [38].
The rpoC1 gene is present as a single copy in the genome, and this molecular marker is usually more discriminatory at the species level than 16S rRNA [39]. In closely related species, this gene is used for better divergence and for issues at the species level [40]. In heterocytous cyanobacteria, it has usually been used for a better understanding of relationships between closely related species within the genera Minunostoc [41], Calothrix, Tolypothrix, Scytonema [42] or Anabaena [43]. Usually, the phylogenetic trees constructed from rpoC1 correspond with the phylogeny based on 16S rRNA. In our case, in the phylogenetic tree based on 16S rRNA [24], N. persica A1387 belonged to a well-defined clade with strains N. persica SA33 (MF066912), Neowestiellopsis sp. KHW5 (MN656995) and Fischerella sp. (AJ544076), with the closest clade containing Hapalosiphon sp. SAG2376 (MK953008) and Fischerella ambigua UTEX 1903 (KJ768871). Furthermore, [...] [23]. In the phylogenetic tree based on rpoC1, the topology of the clades is similar, and the clades formed correspond with the topology of the phylogenetic tree based on the 16S rRNA gene.
The nifH and nifD genes are present only in cyanobacteria containing heterocytes and in picocyanobacteria [44]. Furthermore, the operon nifHDK is essentially conserved in the genome, with minimal translocations and insertions [45]. These molecular markers are usually used in studies of diazotrophic communities and in the past have helped to resolve the phylogeny of closely related species of the genera Anabaena, Aphanizomenon and Nostoc [46,47] and Trichodesmium [48] with the nifH gene, and of Nostoc and Anabaena with the nifD gene [49].
Although the cpcA gene is not suitable for closely related species, it is ideal for multi-locus analyses and identification of strains at the genus level [50]. Based on this marker, the genera Nodularia [51], Anabaenopsis [52], Aphanizomenon [53], Arthrospira and Microcystis [54-57] have been previously studied. Similarly, alignment of Westiellopsis sp. Ind19 and Hapalosiphon welwitschii Ind21 provided a substantial verification of the placement of monoseriate true branching forms as mentioned by Komárek et al. [58]. However, as of now they have all been placed in the family Hapalosiphonaceae, and the use of the phycocyanin locus in this study supports this placement of the true branching forms. Thus it is evident that the cpcBA-IGS locus was robust enough in differentiating the twelve freshwater strains as per taxonomic classification [20]. Our study of this gene shows that Neowestiellopsis forms a well-established clade.
Based on the 16S rRNA analyses presented by Nowruzi et al. [24], strain A1387 was found to be N. persica. However, differences in morphology, production of cyanotoxins, as well as differences in gene sequence similarities for rpoC1 suggest that this strain could possibly be a different species, or at least a different morphotype of this species. For a better understanding of the phylogeny of Neowestiellopsis, more information and sequences, mainly from the genera Neowestiellopsis, Fischerella and Westiellopsis, are needed.

Fig. 1 Morphological characterization of Neowestiellopsis persica A1387. With increasing age there are significant increases in the number of main and branching filaments terminating in an empty sheath (a). Unilateral T-type branches arise from the main filament (b); erect true branches (with T-type branching) (c), usually unilateral (d). Sometimes bilateral branching originates from one (e) or two adjacent cells (f). Sometimes two adjacent cells are separated by a heterocyte (g). Moreover, the studied strain may eventually differentiate a series of spherical, thick-walled cells that are akinetes (h), 6.25 μm length × 3.75-5 μm width. Heterocytes are intercalary (i) and could be found near the branch (j). Sometimes hormogonia are formed at the end of a branch by one cell (k), and also directly on the main trichomes (l). Reproduction occurs via monocyte formation, which is a spherical cell, 3.5-5.5 μm in diameter (n)

Table 2 footnotes: a branching (T, T-branching; V, V-branching); b HG, hormogonia; c A, akinetes; d heterotrichy, indicating differences in the shape of the cells of the main and secondary branches (+, clear differences; U, uniseriate; B, biseriate); e heterocyst position (Tr, terminal; I, intercalary)

Fig. 2 Phylogenetic tree constructed from nucleotide sequences of the cpcA gene. Bootstrap values are shown beside each branch; bootstrap values lower than 30 are not shown. The sequence of the studied strain is highlighted in red

Fig. 3 Phylogenetic tree constructed from nucleotide sequences of the nifD gene. Bootstrap values are shown beside each branch; bootstrap values lower than 30 are not shown. The sequence of the studied strain is highlighted in red

Fig. 6 Phylogenetic tree constructed from nucleotide sequences of the rpoC1 gene. Bootstrap values are shown beside each branch; bootstrap values lower than 30 are not shown. The sequence of the studied strain is highlighted in red

Fig. 4 Phylogenetic tree constructed from nucleotide sequences of the nifH gene. Bootstrap values are shown beside each branch; bootstrap values lower than 30 are not shown. The sequence of the studied strain is highlighted in red

Table 2
Morphological observations of the studied strain. The latter was based on previously published photomicrographs

Table 3
Pairwise distance matrix (p-distances, %) of the psbA gene (666 bp) for Neowestiellopsis persica A1387 and closely related strains

Table 7
Pairwise distance matrix (p-distances, %) of the cpcA gene (270 bp) for Neowestiellopsis persica A1387 and closely related strains